Description

[0001] The present invention relates to a processor 
architecture, and more particularly to a processor archi- 
tecture for handling variable bit width data. 
[0002] Processors generally process a single instruction
of an instruction set in several steps. Early technology
processors performed these steps serially. Advances in
technology have led to pipelined-architecture processors,
also called scalar processors, which perform different
steps of many instructions concurrently. A "superscalar"
processor is implemented using a pipelined structure, but
improves performance by concurrently executing scalar
instructions.

[0003] In a superscalar processor, instruction conflicts
and dependency conditions arise in which an issued
instruction cannot be executed because data or resources
are not available. For example, an issued instruction
cannot execute when its operands are dependent upon data
calculated by other nonexecuted instructions.

[0004] Superscalar processor performance is improved by
the speculative execution of branching instructions and by
continuing to decode instructions regardless of the ability
to execute instructions immediately. Decoupling of
instruction decoding and instruction execution requires a
buffer between the processor's instruction decoder and
functional units that execute instructions.

[0005] Performance of a superscalar processor is also
improved when multiple concurrently-executing instructions
are allowed to access a common register. However, this
inherently creates a resource conflict. One technique for
resolving register conflicts is called "register renaming".
Multiple temporary renaming registers are dynamically
allocated, one for each instruction that sets a value for a
permanent register. In this manner, one permanent register
may serve as a destination for receiving the results of
multiple instructions. These results are temporarily held
in the multiple allocated temporary renaming registers. The
processor keeps track of the renaming registers so that an
instruction that receives data from a renaming register
accesses the appropriate register. This register renaming
function may be implemented using a reorder buffer which
contains temporary renaming registers.
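
By way of illustration only, the renaming technique described
above may be sketched as follows. The class and method names
(RenameTable, allocate, write, lookup) are hypothetical and are
not taken from the described processor. The sketch shows only
that each instruction writing a permanent register receives its
own temporary renaming register, and that a later reader is
directed to the most recently allocated one.

```python
class RenameTable:
    """Toy register renaming: one temporary register per writer (illustrative)."""

    def __init__(self):
        self.temps = []  # allocated renaming registers, oldest first

    def allocate(self, dest_reg):
        """Allocate a renaming register for an instruction that writes dest_reg."""
        self.temps.append({"dest": dest_reg, "value": None})
        return len(self.temps) - 1  # tag identifying the renaming register

    def write(self, tag, value):
        """Store a result in the renaming register identified by tag."""
        self.temps[tag]["value"] = value

    def lookup(self, src_reg):
        """Return the tag of the most recent writer of src_reg, or None."""
        for tag in range(len(self.temps) - 1, -1, -1):
            if self.temps[tag]["dest"] == src_reg:
                return tag
        return None


table = RenameTable()
t0 = table.allocate("EAX")  # first instruction that writes EAX
t1 = table.allocate("EAX")  # second writer gets its own renaming register
newest = table.lookup("EAX")  # a later reader is directed to the second writer
```

Both writers of EAX hold separate storage, so neither has to
wait for the other in this simplified model.
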
[0006] Many existing processors run a large base of
computer programs but are limited in performance. To
improve instruction throughput in such processors, it may
be desirable to incorporate superscalar capabilities
therein. W.M. Johnson in Superscalar Processor Design,
Englewood Cliffs, N.J., Prentice Hall, 1991, p. 261-272,
discusses such a superscalar implementation.
[0007] For example, a family of processors, called the x86
family, has been developed, including 8086, 80286, 80386,
80486 and Pentium™ (Intel Corporation, Santa Clara, CA)
processors. Advantageously, x86 processors are backward
compatible. The newest processors run the same programs as
older processors. x86 processors are considered to employ a
complex-instruction-set-computer (CISC) architecture, in
which many different densely-coded instructions are
implemented.
[0008] A variety of techniques have been used in the x86
family to implement backward compatibility. These
techniques have made the implementation of register
renaming very difficult. For example, the x86 instruction
set uses registers for which at least a subset of bits
overlap the bits of another register, such as word
registers that overlap doubleword registers and byte
registers that overlap word and doubleword registers. As
x86 processors evolved from 8 to 16-bit and then to 32-bit
processors, the register architecture similarly evolved
into a form in which 8-bit general registers AH and AL,
respectively, comprise the high and low bytes of a 16-bit
general register AX. AX, in turn, includes the low order 16
bits of a 32-bit extended general register EAX. B, C and D
registers are similarly constrained. These registers are
supplemented by additional register pairs: ESI:SI, EDI:DI,
ESP:SP and EBP:BP, having low order bits of the 32-bit
extended (E) doubleword registers overlapped by 16-bit word
registers. In addition, x86 processors have an extensive
and complicated instruction set that introduces additional
complexity so that some instruction opcode fields that
specify overlapping registers for some data widths also
specify nonoverlapping registers of other data widths.
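
The overlap of AH, AL, AX and EAX described above may be
illustrated, purely as a sketch, by treating each register name
as a bit range of one 32-bit value. The table and function names
(EAX_ALIASES, read_alias, write_alias) are hypothetical; only
the bit ranges follow from the register layout given above.

```python
# (lowest bit, width) of each register name within the 32-bit EAX value.
EAX_ALIASES = {"EAX": (0, 32), "AX": (0, 16), "AL": (0, 8), "AH": (8, 8)}


def read_alias(eax_value, name):
    """Read the bits of EAX that the named register overlaps."""
    low, width = EAX_ALIASES[name]
    return (eax_value >> low) & ((1 << width) - 1)


def write_alias(eax_value, name, new_value):
    """Write the named register, leaving the other bits of EAX unchanged."""
    low, width = EAX_ALIASES[name]
    mask = ((1 << width) - 1) << low
    return (eax_value & ~mask & 0xFFFFFFFF) | ((new_value << low) & mask)


eax = 0x12345678
ax = read_alias(eax, "AX")              # low order 16 bits
ah = read_alias(eax, "AH")              # high byte of AX
updated = write_alias(eax, "AH", 0xAB)  # a write to AH also changes AX and EAX
```

A write to AH therefore changes the value later read through AX
or EAX, which is the dependency that a renaming scheme has to
track.
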
[0009] If registers cannot be renamed, register access
conflicts are resolved only by having one instruction cede
control to another, delaying the dispatch of an instruction
until the instruction is free of dependencies and causing
stalling of the parallel dispatching of instructions in the
processor pipeline. This causes serial operation of
instructions that are intended to be executed in parallel.

[0010] Because the x86 architecture includes a small number
of registers (eight), frequent register reusage is
encouraged for a superscalar processor that is intended to
execute instructions in parallel. It is thus desirable to
allow register reusage, perhaps by employing register
renaming. Unfortunately, the overlapping nature of x86
instructions limits the renaming of overlapped registers
for resolving mutual data dependencies. Register renaming
is impeded because, although the overlap relationship of
registers is known and invariable and thus predictable,
architectural and code-compatibility constraints require
that the registers be considered independent entities.
Thus, although register renaming could resolve register
resource conflicts in an x86 processor, the x86
architecture substantially limits register renaming.

[0011] It is fundamental to achieving a performance
improvement using parallel processing that instructions be
dispatched regularly and rapidly. When dispatching of
instructions is stalled awaiting execution of another
instruction, the processor performs only as well as a
serial processor.
[0012] Accordingly, it is desirable in a superscalar or
pipelined processor to provide an improved processor
architecture that supports renaming of overlapping register
bit fields. Also, it is desirable in a superscalar or
pipelined processor to provide dependency checking and
forwarding of overlapping register bit fields. We shall
describe a processor that employs partial register renaming
to achieve dependency checking and result forwarding of
overlapping data fields.
[0013] This invention is as set out in claims 1 and 7. 
[0014] Embodiments of the present invention solve problems
arising from parallel processing of overlapping data
structures by providing an apparatus and method for
performing operations upon variable size operands which
include characterizing the variable-sized operands of each
operation as several fields within a full size operand.
Each field is designated as defined or undefined with
respect to the instruction. The apparatus and method also
determine whether the operation is data dependent upon the
execution of another operation for each defined field of
the variable size operand, independently of the other
fields. When data becomes available, variable size data are
forwarded for utilization in the operation for each defined
field independently of the other fields.

[0015] A further embodiment of the present invention solves
problems arising from parallel processing of overlapping
data structures by providing a processor for performing
operations utilizing operand data of a variable size. The
processor includes an instruction decoder for decoding
instructions that utilize variable size operands which are
partitioned from a full size operand into several fields.
The decoder designates each field as defined or undefined
with respect to the instruction. The processor also
includes a reorder buffer for temporarily storing control
information and operand data relating to the operation and
for determining whether variable size operand data utilized
by the operation are available. Availability is determined
for each defined field independently of the other fields. A
plurality of functional units are supplied that execute
operations and generate variable-sized result data for each
defined field independently of the other fields. Several
busses are included for accessing variable size operand
data for utilization by a functional unit which executes an
operation when the data upon which it is dependent becomes
available. Availability of data is tested for each defined
field independently of the other fields.
[0016] In the accompanying drawings, by way of ex- 
ample only: 

Figure 1 is an architecture-level block diagram of a
processor which implements dependency checking
and forwarding of variable width operands;

Figures 2, 3, 4, 5 and 6 are tables that depict
multiple bit fields within an operand utilized by
operations performed by the processor of Figure 1;

Figure 7 depicts control bits of a pointer which
selects a register of a register file;

Figure 8 is an architecture-level block diagram of a
register file within the processor of Figure 1;

Figure 9 is a pictorial illustration of a memory format
in the register file shown in Figure 8;

Figure 10 is an architecture-level block diagram of
a reorder buffer within the processor of Figure 1;

Figure 11 is a pictorial illustration of a memory
format within the reorder buffer of Figure 10;

Figure 12 is a table that depicts dispatching of an
exemplary sequence of instructions using a naive
implementation of register renaming;

Figure 13 is a table that depicts dispatching of an
exemplary sequence of instructions using a
preferred implementation of register renaming;

Figure 14 is an architectural-level block diagram of
a generic functional unit which illustrates input and
output handling of such a unit;

Figure 15 is a pictorial illustration of a FIFO format
within a reservation station of the functional unit of
Figure 14; and

Figure 16 is an architectural-level block diagram of
a load / store functional unit within the processor of
Figure 1.

[0017] The architecture of a superscalar processor 10
having an instruction set for executing integer and
floating point operations is shown in Figure 1. A 64-bit
internal address and data bus 11 communicates address,
data, and control transfers among various functional blocks
of the processor 10 and an external memory 14. An
instruction cache 16 parses and pre-decodes CISC
instructions. A byte queue 35 transfers the predecoded
instructions to an instruction decoder 18, which maps the
CISC instructions to respective sequences of instructions
for RISC-like operations ("ROPs"). The instruction decoder
18 generates type, opcode, and pointer values for all ROPs
based on the pre-decoded CISC instructions in the byte
queue 35.

[0018] A suitable instruction cache 16 is described in
further detail in published EP-A-0 651 322. A suitable byte
queue 35 is described in additional detail in published
EP-A-0 651 324. A suitable instruction decoder 18 is
described in further detail in published EP-A-0 651 320.

[0019] The instruction decoder 18 dispatches ROP operations
to functional blocks within the processor 10 over various
busses. The processor 10 supports four ROP issue, five ROP
results, and the results of up to sixteen speculatively
executed ROPs. Up to four sets of pointers to the A and B
source operands and to a destination register are furnished
over respective A-operand pointers 36, B-operand pointers
37 and destination register pointers 43 by the instruction
decoder 18 to a register file 24 and to a reorder buffer
26. The register file 24 and reorder buffer 26 in turn
furnish the appropriate "predicted executed" versions of
the RISC operands A and B to various functional units on
four pairs of 41-bit A-operand busses 30 and B-operand
busses 31. Associated with the A and B-operand busses 30
and 31 are operand tag busses, including four pairs of
A-operand tag busses 48 and B-operand tag busses 49. When a
result is unavailable for placement on an operand bus, a
tag that identifies an entry in the reorder buffer 26 for
receiving the result when it becomes available is loaded
onto a corresponding operand tag bus. The four pairs of
operand and operand tag busses correspond to four ROP
dispatch positions. The instruction decoder, in cooperation
with the reorder buffer 26, specifies four destination tag
busses 40 for identifying an entry in the reorder buffer 26
that will receive results from the functional units after
an ROP is executed. Functional units execute an ROP, copy
the destination tag onto one of five result tag busses 39,
and place a result on a corresponding one of five result
busses 32 when the result is available. A functional unit
directly accesses a result on result busses 32 when a
corresponding tag on result tag busses 39 matches the
operand tag of an ROP awaiting the result.

[0020] The instruction decoder 18 dispatches opcode
information, including an opcode and an opcode type, that
accompanies the A and B source operand information via four
opcode / type busses 50.
[0021] Processor 10 includes several functional units, such
as a branch unit 20, an integer functional unit 21, a
floating point functional unit 22 and a load / store
functional unit 80. Integer functional unit 21 is presented
in a generic sense and represents units of various types,
such as arithmetic logic units, a shift unit and a special
registers unit. Branch unit 20 executes a branch prediction
operation, a technique which allows an adequate
instruction-fetch rate in the presence of branches and is
needed to achieve performance with multiple instruction
issue. A suitable branch prediction system, including a
branch unit 20 and instruction decoder 18, is described in
further detail in United States Patent No. 5,136,697
(William M. Johnson, "System for Reducing Delay for
Execution Subsequent to Correctly Predicted Branch
Instruction Using Fetch Information Stored with each Block
of Instructions in Cache").
[0022] Processor 10 is shown having a simple set of
functional units to avoid undue complexity. It will be
appreciated that the number and type of functional units
are depicted herein in a specified manner, with a single
floating point functional unit 22 and multiple functional
units 20, 21 and 22 which generally perform operations on
integer data, but other combinations of integer and
floating point units may be implemented, as desired. Each
functional unit 20, 21, 22 and 80 has respective
reservation stations (not shown) having inputs connected to
the operand busses 30 and 31 and the opcode / type busses
50. Reservation stations allow dispatch of speculative ROPs
to the functional units.
[0023] Register file 24 is a physical storage memory
including mapped CISC registers for integer and floating
point instructions. Register file 24 is addressed by up to
two register pointers of the A and B-operand pointers 36
and 37 which designate a register number for source
operands for each of up to four concurrently dispatched
ROPs. These pointers point to a register file entry and the
values in the selected entries are placed onto operand
busses of the operand busses 30 and 31 through eight read
ports. Integers are stored in 32-bit <31:0> registers and
floating point numbers are stored in 82-bit <81:0> floating
point registers of the register file 24. Register file 24
receives integer and floating point results of executed and
nonspeculative operations from the reorder buffer 26 over
four 41-bit writeback busses 34. Results are written from
the reorder buffer 26 to the register file 24 as ROPs are
retired.

[0024] Reorder buffer 26 is a circular FIFO for keeping
track of the relative order of speculatively executed ROPs.
The reorder buffer 26 storage locations are dynamically
allocated for sending retiring results to register file 24
and for receiving results from the functional units. When
an instruction is decoded, its destination operand is
assigned to the next available reorder buffer location, and
its destination register number is associated with this
location, in effect renaming the destination register to
the reorder buffer location. The register numbers of its
source operands are used to access reorder buffer 26 and
register file 24 simultaneously. If the reorder buffer 26
does not have an entry whose destination pointer matches
the source operand register number, then the value in the
register file 24 is selected as the operand. If reorder
buffer 26 does have one or more matching entries, the value
of the most recently allocated matching entry is furnished
if it is available. If the result is unavailable, a tag
identifying this reorder buffer entry is furnished on an
operand tag bus of A and B-operand tag busses 48 and 49.
The result or tag is furnished to the functional units over
the operand busses 30, 31 or operand tag busses 48, 49,
respectively. When results are obtained from completion of
execution in the functional units 20, 21, 22 and 80, the
results and their respective result tags are furnished to
the reorder buffer 26, as well as to the reservation
stations of the functional units, over five bus-wide result
busses 32 and result tag busses 39. Of the five result and
result tag and status busses, four are general purpose
busses for forwarding integer and floating point results to
the reorder buffer. Additional fifth result and result tag
and status busses are used to transfer information, that is
not a forwarded result, from some of the functional units
to the reorder buffer. For example, status information
arising from a store operation by the load / store
functional unit 80 or from a branch operation by the branch
unit 20 is placed on the additional busses. The additional
busses conserve result bus bandwidth.

Reorder buffer 26 handles exceptions and mispredictions,
and maintains the state of certain registers, including a
program counter and execution flags. A suitable unit for a
RISC core, including a reorder buffer, is disclosed in our
published EP-A-0 651 321.
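
The circular FIFO behaviour of the reorder buffer may be
sketched, under simplifying assumptions, as follows. The class
and its method names are hypothetical. The sketch ignores field
selection, tags on busses and the limit of four retirements at a
time, and shows only that entries are allocated in dispatch
order, may receive results in any order, and are retired to the
register file strictly in order.

```python
class ReorderBuffer:
    """Toy circular FIFO with sixteen entries (illustrative only)."""

    SIZE = 16

    def __init__(self):
        self.entries = [None] * self.SIZE
        self.head = 0   # oldest entry, next to retire
        self.tail = 0   # next free location
        self.count = 0

    def allocate(self, dest_reg):
        """Allocate the next location at dispatch; None means the buffer is full."""
        if self.count == self.SIZE:
            return None
        tag = self.tail
        self.entries[tag] = {"dest": dest_reg, "result": None, "valid": False}
        self.tail = (self.tail + 1) % self.SIZE
        self.count += 1
        return tag

    def write_result(self, tag, value):
        """Record a result delivered by a functional unit."""
        self.entries[tag]["result"] = value
        self.entries[tag]["valid"] = True

    def retire(self, register_file):
        """Retire completed entries in order from the head; return how many."""
        retired = 0
        while self.count and self.entries[self.head]["valid"]:
            entry = self.entries[self.head]
            register_file[entry["dest"]] = entry["result"]
            self.entries[self.head] = None
            self.head = (self.head + 1) % self.SIZE
            self.count -= 1
            retired += 1
        return retired


rob = ReorderBuffer()
regs = {}
a = rob.allocate("EAX")
b = rob.allocate("EBX")
rob.write_result(b, 7)      # the younger ROP completes first
first = rob.retire(regs)    # nothing retires while the head is pending
rob.write_result(a, 5)
second = rob.retire(regs)   # both entries now retire in program order
```
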
[0025] The instruction decoder 18 dispatches ROPs
"in-order" to the functional units. The order is maintained
in the order of reorder buffer entries. The functional
units queue ROPs for issue when all previous ROPs in the
queue have completed execution, all source operands are
available either via the operand busses or result busses,
and a result bus is available to receive a result. Thus,
functional units complete ROPs "out-of-order". The dispatch
of operations does not depend on the completion of the
operations so that, unless the processor is stalled by the
unavailability of a reservation station queue or an
unallocated reorder buffer entry, instruction decoder 18
continues to decode instructions regardless of whether they
can be promptly completed.
[0026] It is preferable to define a data path having at
least 32 bits of width for handling integer data. The data
path includes registers in the register file 24 and a
result field in each entry of the reorder buffer 26, as
well as the operand, result and writeback busses. In one
embodiment, the processor has a 41-bit data path to
accommodate floating point operations. The 32-bit data path
is mapped into bits <31:0> of a 41-bit structure.
[0027] A suitable load/store functional unit is dis- 
closed in our published EP-A-0 679 992. 
[0028] Multiple data types are represented by the 32-bit
integer data structure depicted in Figures 2, 3, 4, 5 and
6. Data structure 200, shown in Figure 2, is partitioned
into three fields: a 16-bit high field 217, an 8-bit middle
field 216 and an 8-bit low field 215. A doubleword 202
shown in Figure 3 represents a 32-bit integer data element
that uses all the bits of the low, middle and high fields
of the structure 200. A word 204 shown in Figure 4
represents a 16-bit integer element that uses all bits of
the low and middle fields. Bytes 206 and 208, shown
respectively in Figures 5 and 6, represent 8-bit integer
elements that employ either all bits of the low field to
define a low byte, or all bits of the middle field to
define a high byte. Unused bits of data elements that are
smaller than 32 bits are generally set to zero by various
functional blocks within the processor 10.
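
As an illustrative sketch, the three fields and the data types
that use them may be expressed as bit masks. The names
(FIELDS_USED, mask_for, extract) are hypothetical, and the
placement of the low field in the least significant bits is an
assumption that is consistent with the 16-, 8- and 8-bit widths
given above.

```python
# Field masks within the 32-bit integer data structure.
HIGH_MASK = 0xFFFF0000  # 16-bit high field
MID_MASK = 0x0000FF00   # 8-bit middle field
LOW_MASK = 0x000000FF   # 8-bit low field

# Fields used by each data type, as (high, middle, low).
FIELDS_USED = {
    "doubleword": (True, True, True),
    "word": (False, True, True),
    "high_byte": (False, True, False),
    "low_byte": (False, False, True),
}


def mask_for(data_type):
    """Combine the masks of the fields that the data type uses."""
    high, middle, low = FIELDS_USED[data_type]
    return (
        (HIGH_MASK if high else 0)
        | (MID_MASK if middle else 0)
        | (LOW_MASK if low else 0)
    )


def extract(value, data_type):
    """Keep the bits of the used fields; unused bits are set to zero."""
    return value & mask_for(data_type)


word_value = extract(0x12345678, "word")
high_byte_value = extract(0x12345678, "high_byte")
```

Note that the high byte remains in the middle field position in
this sketch rather than being shifted down to the low field.
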
[0029] Each A-operand pointer, B-operand pointer and
destination register pointer is encoded in nine bits, as is
shown by the pointer 210 of Figure 7. The high order six
bits <8:3> of the pointer 210 specify a register address
which selects a particular register within the register
file 24 that is operated upon by an ROP. The low order
three bits (H, M and L) are field select bits which specify
the fields of the register that are defined to be utilized
by the ROP. The L bit selects the low bit field 215 of the
data structure 200 of Figure 2. The M bit selects the
middle bit field 216 and the H bit selects the high bit
field 217. In this manner, the pointer 210 supports
selection of a bit field, independently of the selection of
the other bit fields.
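
The nine-bit pointer format may be sketched as follows. The
function names are hypothetical. The assignment of H, M and L to
bits 2, 1 and 0 respectively is assumed from their stated order
within the low order three bits.

```python
H_BIT, M_BIT, L_BIT = 0b100, 0b010, 0b001  # field select bits <2:0>


def encode_pointer(reg_addr, high, middle, low):
    """Pack a six-bit register address and three field select bits."""
    if not 0 <= reg_addr < 64:
        raise ValueError("register address must fit in six bits")
    select = (
        (H_BIT if high else 0)
        | (M_BIT if middle else 0)
        | (L_BIT if low else 0)
    )
    return (reg_addr << 3) | select


def decode_pointer(pointer):
    """Unpack a nine-bit pointer into its address and field select bits."""
    return {
        "reg_addr": (pointer >> 3) & 0x3F,
        "high": bool(pointer & H_BIT),
        "middle": bool(pointer & M_BIT),
        "low": bool(pointer & L_BIT),
    }


# A word-sized access to register address 5 selects the middle and low fields.
word_pointer = encode_pointer(5, high=False, middle=True, low=True)
decoded = decode_pointer(word_pointer)
```
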

[0030] A detailed illustration of the register file 24 is
shown in Figure 8. The register file 24 includes a read
decoder 60, a register file array 62, a write decoder 64, a
register file control 66 and a register file operand bus
driver 68. The read decoder 60 receives selected bits <8:3>
of the A and B-operand pointers 36 and 37 for addressing
the register file array 62 via four pairs of 64-bit A and B
operand address signals RA0, RA1, RA2, RA3, RB0, RB1, RB2
and RB3. The remainder of the A and B-operand pointer bits
<2:0> are applied to the register file operand bus driver
68 to drive appropriate fields of operand data.

[0031] The register file array 62 receives result data from
the reorder buffer 26 via the four writeback busses 34.
When a reorder buffer entry is retired in parallel with up
to three other reorder buffer entries, result data for the
entry is placed on one of the writeback busses 34 and the
destination pointer for that entry is placed on a write
pointer 33 that corresponds to the selected writeback bus.
Data on writeback busses 34 are sent to designated
registers in the register file array 62 in accordance with
address signals on write pointer busses 33 which are
applied to the write decoder 64.

[0032] The register file control 66 receives override
signals on A operand override lines 57 and B operand
override lines 58 from the reorder buffer 26, which are
then conveyed from the register file control 66 to the
register file operand bus driver 68. The register file
array 62 includes multiple addressable storage registers
for storing results operated upon and generated by
processor functional units. Figure 9 shows an exemplary
register file array 62 with forty registers, including
eight 32-bit integer registers (EAX, EBX, ECX, EDX, ESP,
EBP, ESI and EDI), eight 82-bit floating point registers
FP0 through FP7, sixteen 41-bit temporary integer registers
ETMP0 through ETMP15 and eight 82-bit temporary floating
point registers FTMP0 through FTMP7 which, in this
embodiment, are mapped into the same physical register
locations as the temporary integer registers ETMP0 through
ETMP15.
[0033] Referring to Figure 10, reorder buffer 26 includes a
reorder buffer (ROB) control and status block 70, a ROB
array 74, and a ROB operand bus driver 76. ROB control and
status block 70 is connected to the A and B-operand
pointers 36 and 37 and the destination pointer (DEST REG)
busses 43 to receive inputs which identify source and
destination operands for an ROP. ROB array 74 is a memory
array which is controlled by ROB control and status block
70. ROB array 74 is connected to the result busses 32 to
receive results from the functional units. Control signals,
including a head, a tail, an A operand select, a B operand
select and a result select signal, are conveyed from ROB
control and status 70 to ROB array 74. These control
signals select ROB array elements that receive input data
from the result busses 32 and that are output to writeback
busses 34, write pointers 33, A and B-operand busses 30 and
31, and A and B-operand tag busses 48 and 49. Sixteen
destination pointers, one for each reorder buffer array
element, are conveyed from ROB array 74 to ROB control and
status 70 for performing dependency checking.
[0034] Figure 11 depicts an example of a reorder buffer
array 74 which includes sixteen entries, each of which
includes a result field, a destination pointer field and
other fields for storing control information. A 41-bit
result field is furnished to store results received from
the functional units. Two reorder buffer entries are used
to store a floating point result. Integer results are
stored in 32 of the 41 bits and the remaining nine bits are
used to hold status flags. The destination pointer field
(DEST_PTR<8:0>) of each ROB array 74 entry designates a
destination register in register file 24.
[0035] The operation of the reorder buffer 26 and register
file 24 in combination is described with reference to
Figures 8, 9, 10 and 11. As the instruction decoder 18
dispatches ROPs, it provides source operand pointers to the
register file 24 and the reorder buffer 26 to select the
contents of a register or a reorder buffer entry for
application to the operand busses as a source operand using
the four pairs of A and B-operand pointers 36 and 37. In a
similar manner, the instruction decoder 18 provides a
destination pointer to the reorder buffer 26 to identify a
particular destination register of the thirty destination
registers in the register file 24, using the four
destination register (DEST_REG) pointer busses 43. The
destination register is selected to receive the result of
an executed ROP.

[0036] When an ROP is dispatched, an entry of the reorder
buffer 26 is allocated to it. The entry includes a result
field, to receive result data from the functional units.
The result field of each entry is defined as a doubleword
field, a word field, a high byte field, or a low byte
field, and receives a corresponding field of the result
data when it becomes available upon execution of an ROP.
For a doubleword operand field, the 16-bit high field, the
8-bit middle field, and the 8-bit low field are all
defined, as indicated by set bits 217, 216 and 215
respectively. For a word operand field, only the 8-bit
middle field and the 8-bit low field are defined, as
indicated by set bits 216 and 215 respectively. For a low
byte operand field, only the 8-bit low field is defined, as
indicated by set bit 215. For a high byte operand field,
only the 8-bit middle field is defined, as indicated by set
bit 216. The destination pointer DEST_PTR, which contains
the register address in DEST_PTR<8:3> and the defined field
bits 217, 216 and 215 in DEST_PTR<2:0>, is received by the
reorder buffer control and status block 70 over destination
register (DEST_REG) busses 43, and written into the
destination pointer field DEST_PTR<8:0> of the allocated
entry of the reorder buffer array 74.



[0037] The pointer of the A or B-operand pointers 36 and 37
addresses the ROB array 74, through the ROB control block
70, to designate the operand data to be applied to the ROB
operand bus driver 76. ROB control and status 70 receives
the operand pointers via the A and B-operand pointers 36
and 37.
[0038] The reorder buffer 26 accomplishes dependency
checking by simultaneously comparing each of the pointers
of A and B-operand pointers 36 and 37 to the destination
pointer fields of all sixteen elements of reorder buffer
array 74 to determine whether a match, which identifies a
data dependency, exists. Up to eight operand pointers are
simultaneously compared to the destination pointers. For
the high operand field, bits <8:3,2> of the operand pointer
are compared to bits <8:3,2> of the sixteen destination
pointer fields. For the middle operand field, bits <8:3,1>
of the operand pointer are compared with bits <8:3,1> of
the sixteen destination pointer fields. For the low operand
field, bits <8:3,0> of the operand pointer are compared
with bits <8:3,0> of the sixteen destination pointer
fields. A match for a particular field occurs when the
operand pointer bits <8:3> match the destination pointer
bits <8:3> and the field select bit identifying the
particular field is asserted in both the operand pointer
bits <2:0> and the destination pointer bits <2:0>. An
operand pointer may match destination pointers for multiple
reorder buffer entries. When one or more matches occur, a
pointer to the matching reorder buffer entry closest to the
tail of the queue is used to identify the appropriate
operand data. This pointer is called an operand tag. The
reorder buffer 26 furnishes three operand tags for each
operand, one for each field of the high, medium and low
operand fields. The operand tags are applied as pointers to
the reorder buffer entries to drive result data onto an
operand bus. The high, medium and low field select bits of
the operand pointer drive the reorder buffer result data
respectively onto bits <31:16>, <15:8> and <7:0> of the
operand bus.
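
The per-field comparison may be sketched in software as follows,
although the described reorder buffer performs the comparisons
simultaneously in hardware. The function name and the list
representation are hypothetical; destination pointers are given
oldest first, so that the last match found is the entry closest
to the tail.

```python
def find_operand_tags(operand_ptr, dest_ptrs_oldest_first):
    """Return, per field, the index of the newest matching entry or None."""
    tags = {}
    for name, bit in (("high", 2), ("middle", 1), ("low", 0)):
        tags[name] = None
        if not (operand_ptr >> bit) & 1:
            continue  # the operand does not use this field
        for index, dest_ptr in enumerate(dest_ptrs_oldest_first):
            same_register = (dest_ptr >> 3) == (operand_ptr >> 3)
            if same_register and (dest_ptr >> bit) & 1:
                tags[name] = index  # newer matches replace older ones
    return tags


# Register addresses 0 and 1 are arbitrary illustrative values.
dest_ptrs = [
    (0 << 3) | 0b111,  # entry 0 writes all three fields of register 0
    (0 << 3) | 0b001,  # entry 1 writes only the low field of register 0
    (1 << 3) | 0b011,  # entry 2 writes a word of register 1
]
word_read = (0 << 3) | 0b011  # source operand: middle and low fields of register 0
tags = find_operand_tags(word_read, dest_ptrs)
```

In this example the middle field depends on entry 0 while the
low field depends on entry 1, so one operand is assembled from
two different reorder buffer entries.
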

[0039] The status and control field <23:0> of the reorder
buffer array 74 includes a result valid bit which is
asserted when the reorder buffer 26 receives a result from
a functional unit. The result valid bit applies to all of
the high, medium and low fields of the result.
Simultaneously for the three fields, the ROB control block
70 addresses the ROB array 74 using the field-specific
operand tags as pointers. The ROB array 74 furnishes
appropriate bits of the result field of the entry to the
ROB operand bus driver 76. A and B-operand pointer bits
<2:0> are applied to the ROB operand bus driver 76 to
define the H, M and L bit fields to be driven onto the
operand busses. Fields that are not defined are driven as
zeros onto an operand bus.

[0040] As the instruction decoder 18 drives opcodes onto
the opcode/type busses 50 and the reorder buffer 26 drives
operands onto the operand busses 30 and 31, the reorder
buffer 26 also drives the operand tags for each of the
high, medium and low operand fields onto the operand tag
busses 48 and 49. The three operand tags each include an
identifier of the reorder buffer entry from which a source
operand is sought, regardless of whether the data is
available. An operand tag valid bit is associated with each
of the high, medium and low field operand tags. An operand
tag valid bit is asserted to indicate that operand data is
not available. The operand tag valid bit is obtained for a
reorder buffer entry by inverting the result valid bit of
the entry. Thus, there are three independent tags in the A
and B-operand tag busses 48 and 49 associated with each
operand bus 30 and 31 that supply tagging information for
the low, middle and high data fields. Each of the high,
medium and low operand tags is independent of the other
tags so that, for the different fields, data may be driven
onto operand busses from different reorder buffer entries
or from the same entry, or may not be driven from a reorder
buffer entry at all.
[0041] In the event that a particular field of an operand
is driven by the register file 24, the ROB 26 does not
assert the corresponding operand tag valid bit, thereby
indicating that operand data for the particular field is
available. A suitable dependency checking circuit is
described in detail in our European patent application EP
0 679 991 A.
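
The derivation of the per-field operand tags and operand tag valid bits may be sketched as follows. This is a minimal Python illustration, not the dependency checking circuit itself; the RobEntry class, the operand_tags function and the use of None to denote a field supplied by the register file are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class RobEntry:
    index: int          # identifier of the reorder buffer entry
    result_valid: bool  # asserted once a functional unit has returned a result


def operand_tags(sources):
    """Build (tag, tag_valid) for each of the H, M and L fields.

    sources maps a field name to the RobEntry that produces it, or to None
    when the register file supplies the field (an assumption of this sketch).
    """
    tags = {}
    for name in ("H", "M", "L"):
        entry = sources.get(name)
        if entry is None:
            # Field driven by the register file: tag valid bit not asserted.
            tags[name] = (None, False)
        else:
            # Tag valid bit is the inverse of the entry's result valid bit.
            tags[name] = (entry.index, not entry.result_valid)
    return tags
```

Each field is handled on its own, so the three tags of one operand may name different reorder buffer entries.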

[0042] The reorder buffer 26 sends override signals
whenever it detects a dependency, whether the result is
held in the result field of the reorder buffer entry or the
result is unavailable awaiting execution of an ROP. In
either case, the reorder buffer control 70 sets override
signals for each dependent field of an operand to the
register file 24 via an appropriate one of A override lines 57
or B override lines 58. The reorder buffer control 70
overrides the read operation of any dependent low,
middle or high fields of a register file array 62 entry by setting
bits of the override busses 57 and 58 that are applied to
the register file operand bus driver 68. The A override
busses 57 and the B override busses 58 include three
forwarded-operand bits for each of the four A and
B-operand busses 30 and 31. The reorder buffer 26
controls overriding of any data fields and the register file 24
responds by disabling the register file operand bus
driver 68 as instructed. Thus, an attempt to place data from
the register file 24 onto an operand bus is overridden,
but only for the operand fields for which a dependency
exists.
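
The effect of the per-field override on the contents of an operand bus may be sketched as follows. The function name, the 32-bit field masks and the convention that the reorder buffer contributes zeros for a result that is not yet available are assumptions made for this Python illustration.

```python
FIELD_MASKS = {"H": 0xFFFF0000, "M": 0x0000FF00, "L": 0x000000FF}


def operand_bus_value(regfile_value, rob_value, override):
    """Combine register file and reorder buffer data field by field.

    override is the set of field names for which a dependency was detected.
    For those fields the register file driver is disabled and the reorder
    buffer supplies the bits (zeros if the result is not yet available,
    which is an assumption of this sketch).
    """
    bus = 0
    for name, mask in FIELD_MASKS.items():
        if name in override:
            bus |= rob_value & mask      # register file driver overridden
        else:
            bus |= regfile_value & mask  # register file drives this field
    return bus
```

With only the M field overridden, the H and L fields still come from the register file while bits <15:8> come from the reorder buffer.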
[0043] The read decoder 60 receives the A and B-operand
pointers 36 and 37 and decodes them to select registers in
the register file 24. The read decoder 60 decodes the high
six bits of each operand pointer 36 and 37 to select a
register. The value
from the accessed register is latched and driven onto
one of the four pairs of 41-bit A or B-operand transfer
lines connecting the register file array 62 to the register
file operand bus driver 68. Bit positions that are not
implemented in the integer registers of the register file
array 62 are read as logical zeros on these busses. The
register file operand bus driver 68 drives the latched
values selected in accordance with the H, M and L bit fields
defined by bits <2:0> of operand pointers 36 and 37 onto
A and B-operand busses 30 and 31. The register file
control 66 receives the A and B override signals 57 and
58 from the reorder buffer 26 to direct the override of a
register file read operation in any of the low, middle or
high fields of the entry.

[0044] If the reorder buffer 26 determines that source
operand data are not dependent on unavailable data
and are therefore available either from the register file
24 or the reorder buffer 26, the operand data are sent
via operand busses 30 and 31 to the functional unit
reservation stations.

[0045] As functional units complete execution of
operations and place results on the result busses 32, ROB
control and status 70 receives pointers from the result
tag busses 39 which designate the corresponding ROB
array entries to receive data from the result busses 32.
ROB control 70 directs the transfer of data from the
result busses 32 to the ROB array 74 using four result
select pointers.

[0046] ROB control and status 70 retires an ROP, 
communicating the result to the register file 24, by plac- 
ing the result field of an ROB array 74 element on one 
of the writeback busses 34 and driving the write pointer 
33 corresponding to the writeback bus with the destina- 
tion pointer. Write pointer 33 designates the register ad- 
dress within register file 24 to receive the retired result. 
[0047] Referring to Figure 12, without register renaming,
resource conflicts arise in which a subsequent
instruction must wait for the completed execution of a
previous instruction to resolve the conflict. This
phenomenon is illustrated by the following sequence of x86
instructions:

mov ah,byte1

mov al,byte2

mov word12,ax

This code sequence may be used
in a loop to interleave two byte strings or to swap the
byte order of 16-bit data. The first instruction loads byte1
into register AH of register EAX. The second instruction
loads byte2 into register AL of register EAX. The third
instruction stores the AX register contents into a 16-bit
memory location, named word12.
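
The architectural effect of the three instructions may be modelled in a few lines of Python. The function name and the representation of EAX as a 32-bit integer are assumptions made for this illustration, which shows that the first two instructions write disjoint bit fields of the same register.

```python
def run_sequence(eax, byte1, byte2):
    """Model the three-instruction sequence on a 32-bit EAX value."""
    # mov ah,byte1: write bits <15:8>, leave all other bits unchanged.
    eax = (eax & ~0xFF00 & 0xFFFFFFFF) | ((byte1 & 0xFF) << 8)
    # mov al,byte2: write bits <7:0>, leave all other bits unchanged.
    eax = (eax & ~0xFF & 0xFFFFFFFF) | (byte2 & 0xFF)
    # mov word12,ax: store the low 16 bits of EAX.
    word12 = eax & 0xFFFF
    return eax, word12
```

Because neither of the first two writes touches the bits written by the other, the order in which they complete does not affect the value stored by the third instruction.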
[0048] In one implementation of register renaming, a
modification to any part of the register EAX creates a
new instance of the full register. This is appropriate for
handling independent operations on the full register in
parallel. However, to modify only a part of register EAX,
such as in the first and second instructions above, and
still be able to forward the full register contents to
subsequent operations (the third instruction), the current
contents of register EAX must be supplied to the
functional unit for merger with the new field value to create
the new 32-bit contents of register EAX. This generates
a dependency of the second instruction upon the first
instruction, shown by arrow A of Figure 12, so that the
instructions execute in a serial manner. The third
instruction is dependent upon the second, shown by arrow
B, so that none of the three instructions can be executed
in parallel.

[0049] Furthermore, the destination register becomes
a required input operand. Many x86 instructions have a
two-operand form in which the destination is also one of
the inputs. However, several instructions are defined in
which this is not the case and the destination becomes
a third input. Since the dependency handling logic must
handle any of these cases, a greater burden is placed
on the logic that is used only for these few exceptional
instructions.

[0050] In a preferred implementation of register re- 
naming, shown in Figure 13, partial fields of a register 
(EAX) are renamed so that the second instruction does 
not depend on the first. Only the third instruction is de- 
pendent upon a previous instruction so that the first two 
instructions may execute in parallel and execution of on- 
ly the third instruction is delayed due to a dependency 
condition. This preferred implementation of register re- 
naming reduces the total time for the sequence from 
three cycles to two. Additional acceleration of the proc- 
essor is accomplished since the data that results from 
the execution of the first two instructions is placed on 
the result busses and forwarded for execution of the 
third instruction. 
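
The difference between the two renaming schemes for the sequence above may be sketched as a small scheduling model in Python. The model assumes one-cycle operations, unlimited functional units and register fields named in the form 'EAX.M'; the schedule function and the SEQUENCE constant are hypothetical and serve only to reproduce the three-cycle and two-cycle figures.

```python
def schedule(ops, per_field):
    """Return the earliest execution cycle (1-based) of each operation.

    ops is a list of (writes, reads) pairs of field names such as 'EAX.M'.
    With per_field=False, every partial write also depends on the previous
    contents of the whole register, as in full-register renaming.
    """
    def key(name):
        return name if per_field else name.split(".")[0]

    ready = {}   # renamed resource -> cycle in which its latest writer executes
    cycles = []
    for writes, reads in ops:
        deps = {key(r) for r in reads}
        if not per_field:
            # A partial write must merge with the current register contents.
            deps |= {key(w) for w in writes}
        start = 1 + max((ready.get(d, 0) for d in deps), default=0)
        cycles.append(start)
        for w in writes:
            ready[key(w)] = start
    return cycles


# mov ah,byte1 ; mov al,byte2 ; mov word12,ax
SEQUENCE = [
    ({"EAX.M"}, set()),
    ({"EAX.L"}, set()),
    (set(), {"EAX.M", "EAX.L"}),
]
```

Under these assumptions, schedule(SEQUENCE, per_field=False) places the operations in cycles 1, 2 and 3, whereas schedule(SEQUENCE, per_field=True) places them in cycles 1, 1 and 2.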

[0051] Figure 14 illustrates a generic functional unit 
22 that incorporates a generally standard set of compo- 
nent blocks and supports operand data having selecta- 
ble variable bit widths. Functional units may differ from 
the generic embodiment with respect to various details. 
For example, a functional unit may have several reser- 
vation stations and access more than one set of operand 
busses and result busses at one time. The generic func- 
tional unit 22 includes an A multiplexer 41, a B multi- 
plexer 42 and a tag-opcode multiplexer 45 for receiving 
input data and control signals. The generic functional 
unit 22 also includes a reservation station 44, an exe- 
cution unit 95 and a result bus driver 93. The reservation 
station 44 includes a FIFO 51, a tag-opcode FIFO 89, 
an A operand forwarding circuit 90 and a B operand
forwarding circuit 92.

[0052] The A multiplexer 41 is a 4:1 multiplexer that
is connected to the four 41-bit A-operand busses 30 and
the four A-operand tag busses 48 to receive respective
input operands and operand tags. Similarly, the B
multiplexer 42 is a 4:1 multiplexer that is connected to the
four 41-bit B-operand busses 31 and the four B-operand
tag busses 49. The tag-opcode multiplexer block 45
includes type comparison logic (not shown) and two
multiplexers, a tag multiplexer (not shown) that is connected
to the four destination tag busses 40 and an opcode
multiplexer (not shown) that is connected to the four
opcode/type busses 50. The tag multiplexer and the opcode
multiplexer are 4:1 multiplexers. The bus select signal
connects type comparison logic, the tag multiplexer and
the opcode multiplexer internal to the tag-opcode
multiplexer block 45 and is connected to the A multiplexer
41 and the B multiplexer 42.

[0053] Within the reservation station 44, the tag-opcode
FIFO 89 is connected to the tag-opcode multiplexer
45 by destination tag lines and opcode lines that
correspond respectively to a selected bus of the destination
tag busses 40 and to a selected bus of the opcode/type
busses 50. The FIFO 51 is connected to the A multiplexer
41 by a first set of lines that correspond to a selected
bus of the A-operand busses 30 and by a second set of
lines that correspond to a selected bus of the A-operand
tag busses 48. The first set of lines is connected
internal to the FIFO 51 to an A data FIFO 52. Internal to the
FIFO 51, the A data FIFO 52 has lines which connect to
the A operand forwarding circuit 90. The second set of
lines is connected internal to the FIFO 51 to an A tag
FIFO 53. The FIFO 51 is connected to the B multiplexer
42 by a third set of lines that correspond to a selected
bus of the B-operand busses 31 and by a fourth set of
lines that correspond to a selected bus of the B-operand
tag busses 49. The third set of lines is connected
internal to the FIFO 51 to a B data FIFO 55. Internal to the
FIFO 51, the B data FIFO 55 has lines which connect to
the B operand forwarding circuit 92. The fourth set of
lines is connected internal to the FIFO 51 to a B tag
FIFO 56. The A operand forwarding circuit 90 is
connected to the A tag FIFO 53, the five result tag busses
39 and the five result busses 32. Similarly, the B operand
forwarding circuit 92 is connected to the B tag FIFO 56,
the five result tag busses 39 and the five result busses
32.

[0054] The execution unit 95 is connected to the
reservation station 44 using A operand lines from the A data
FIFO 52, B operand lines from the B data FIFO 55, and
destination tag lines and opcode lines from the tag-opcode
FIFO block 89. The execution unit 95 is also
connected to a result grant signal input from a result bus
arbitrator (not shown). The result bus driver 93 is
connected at its outputs to a result request signal line which
connects to the result bus arbitrator, the result tag
busses 39 and the result busses 32.
[0055] The functional unit is activated when a type
code match occurs between a dispatched ROP and a
functional unit. A type code match takes place when a
type code on one of the four type code busses
corresponds to the type code assigned to the functional unit.
When a type code matches, the tag-opcode multiplexer
45 generates a bus select signal that specifies the
particular bus of the operand, operand tag, destination tag
and opcode/type busses to be selected. The bus select
signal is applied to the tag multiplexer and the opcode
multiplexer of the tag-opcode multiplexer 45, the A
multiplexer 41 and the B multiplexer 42, directing operand
and tag information into the reservation station 44. The
selected destination tag and the selected opcode are
written into the tag-opcode FIFO 89. The tag FIFO and
the opcode FIFO of the tag-opcode FIFO 89 are temporary
memories for holding the destination tag as well as
the local opcode. The tag identifies the entry within the
reorder buffer 26 into which the ROP and its result are
eventually written after the instruction is executed and
its result is placed on the result busses 32. Thus, the A
and B operands, A and B operand tags, the destination
tag and the opcode are held in the reservation station
44 and pushed through the FIFO 51 and tag-opcode
FIFO 89 for each reservation station entry.
[0056] The selected A operand data and A operand
tag are respectively written into the A data FIFO 52 and
the A tag FIFO 53. The selected B operand data and B
operand tag are respectively written into the B data FIFO
55 and the B tag FIFO 56. Each of the A tag 53 and B
tag 56 entries in the FIFO 51 includes three operand
tags corresponding to the high, medium and low fields
of an operand. As shown generally in Figure 15, each
of the high, medium and low operand fields 100, 101
and 102 has an associated operand tag valid bit 106,
107 and 108. Operand tag valid bits from the operand
tag bus are directed through a multiplexer to a tag FIFO
in the FIFO 51. A tag FIFO entry includes the high,
medium and low operand tags 103, 104 and 105, which are
written into the tag FIFO together. The tag FIFO also
includes the operand tag valid bits 106, 107 and 108
which indicate for each field, when set, that operand
data is not available. Thus, if a field is defined and data is
not available in the register file 24 or reorder buffer 26,
the reorder buffer 26 drives onto the operand tag bus an
asserted operand tag valid bit, accompanied by the
operand tag which identifies the reorder buffer entry into
which the unavailable data is written when it becomes
available. If a field is undefined with respect to the ROP
or data is available, the operand tag valid bit for that field
is nonasserted.

[0057] The purpose of a reservation station is to allow 
an instruction decoder to dispatch speculative ROPs to 
functional units regardless of whether source operands 
are currently available. This allows a number of specu- 
lative ROPs to be dispatched without waiting for a cal- 
culation or a load / store to complete. The A data 52, A 
tag 53, B data 55, B tag 56, and destination tag and op- 
code FIFOs of the tag-opcode FIFO block 89 are two- 
deep FIFOs so that the reservation station 44 can hold 
two source operands and tags plus the information on 
the destination and opcode in each of the entries. 
[0058] The reservation station 44 also forwards 
source operands that were unavailable at dispatch di- 
rectly from the result busses 32 using the operand tags 
and the operand tag valid bits stored therein. When all 
appropriate A and B operand data fields are present in 
FIFO 51, the functional unit arbitrates for a bus of the 
result busses 32 using a result request signal from the 
result bus driver 93. 

[0059] When a result bus is available and the result
grant signal is asserted, the execution unit 95 executes
the ROP and conveys result data to the result bus driver
93. Depending on the type of functional unit, the
execution unit 95 executes one or more of a variety
of operations that are standard in processors, including
integer or floating point arithmetic, shifting, data
load/store operations, data comparison and branching
operations, and logic operations, for example.
[0060] The execution unit 95 also arranges the data
for output to the result busses 32. For single-byte
operand opcodes, a one-bit AHBYTE (high byte) signal
determines whether an 8-bit register operand is a high byte
or a low byte, residing in the middle M or low L register
fields, respectively, as is shown by the data structure
200 of Figure 2. If the AHBYTE signal is set, execution
unit 95 locally remaps data in the middle field (bits
<15:8>) into the low bit field position (bits <7:0>) before
executing an ROP. The remapping operation includes the
operations of sign extending or zero extending the
remapped high bytes, in accordance with the specified
operation. High bytes are always read from the middle field
from the register file 24, the reorder buffer 26, the
operand busses 30 and 31 and the result busses 32. The
high byte is remapped locally by functional units that
perform right-justified operations.
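
The local remapping of a high byte may be sketched as follows. The function name, the sign_extend parameter and the extension to a 32-bit value are assumptions made for this Python illustration; the description states only that the remapped byte is sign extended or zero extended in accordance with the specified operation.

```python
def remap_high_byte(operand, ahbyte, sign_extend=False):
    """Right-justify a high-byte operand before execution.

    When ahbyte is set, bits <15:8> are moved to bits <7:0> and the result
    is zero extended or sign extended to 32 bits (the width is assumed).
    When ahbyte is clear, the operand is returned unchanged.
    """
    if not ahbyte:
        return operand
    byte = (operand >> 8) & 0xFF
    if sign_extend and (byte & 0x80):
        return byte | 0xFFFFFF00
    return byte
```

For example, with AHBYTE set the operand 0x0000AB00 becomes 0x000000AB when zero extended and 0xFFFFFFAB when sign extended.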
[0061] The result bus driver 93 drives the result data
onto the available 41-bit result bus and the corresponding
entries in the reservation station data, operand tag,
destination tag and opcode FIFOs are cleared. The
result bus driver 93 drives the destination tag from the
destination tag FIFO onto the result tag bus that is
associated with the available result bus. In addition to the
destination tag, the result bus driver 93 sets status
indication signals on the result bus including normal, valid and
exception flags, for example.

[0062] The A operand forwarding circuit 90 and the B
operand forwarding circuit 92 monitor the result tag
busses 39 to detect a result that satisfies a data
dependency that is delaying execution of an ROP within the
FIFO 51. A result tag identifies the reorder buffer entry
into which the result is written. The A operand forwarding
circuit 90 monitors the result tag busses 39 and
compares the result tags carried thereon to a tag from the
A tag FIFO 53. In this monitoring operation, the forwarding
circuit compares each of the high, medium and low-order
operand tags to the result tag ROB entry identifier.
Although all three fields are tested simultaneously, each
of the three fields is tested independently of the other
fields.

[0063] For each field, when the field-specific operand
tag matches the result tag, a data dependency is
resolved for that field. The result data for the field is
forwarded into the corresponding data FIFO entry and
written into the bits of the field. The operand tag valid bit for
the corresponding tag FIFO entry and field is cleared to
indicate that the data dependency relating to that field
is resolved.
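
The per-field tag comparison and forwarding described above may be sketched as follows. The OperandEntry class, its method names and the 32-bit field masks are hypothetical and are introduced only for this Python illustration of a single reservation station operand.

```python
from dataclasses import dataclass, field

FIELD_MASKS = {"H": 0xFFFF0000, "M": 0x0000FF00, "L": 0x000000FF}


@dataclass
class OperandEntry:
    data: int = 0
    tags: dict = field(default_factory=dict)       # field name -> ROB entry id
    tag_valid: dict = field(default_factory=dict)  # field name -> data missing

    def snoop(self, result_tag, result_data):
        """Compare each field's tag to a result tag and forward on a match."""
        for name, mask in FIELD_MASKS.items():
            if self.tag_valid.get(name) and self.tags.get(name) == result_tag:
                # Write only the bits of the matching field.
                self.data = (self.data & ~mask & 0xFFFFFFFF) | (result_data & mask)
                # Clearing the tag valid bit marks the dependency as resolved.
                self.tag_valid[name] = False

    def ready(self):
        """True when no field of the operand is still awaiting data."""
        return not any(self.tag_valid.values())
```

An operand whose M field waits on entry 3 and whose L field waits on entry 5 becomes ready only after both results have appeared, in either order, on a result bus.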

[0064] When data dependencies are resolved for all
fields of all source operands of an ROP, and the
functional unit is not busy and a result bus is available, the
ROP is executed. In a similar manner, the B operand
forwarding circuit 92 monitors the result tag busses 39
and compares the result tags carried thereon to a tag
from the B tag FIFO 56. The result bus driver 93 always
drives a result onto a result bus with each field
appropriately positioned so that data is consistently presented
in the correct position when data is forwarded to the
reservation station 44.

[0065] Referring to Figure 16, load/store functional
unit 80 executes LOAD and STORE instructions and
interacts with the data cache 86. Load/store functional unit
80 includes a dual-ported reservation station 85, a
four-entry store buffer 84 and a load/store result bus driver
87. Each port is connected to the store buffer 84 and the
data cache 86 by a channel, which includes 40 data bits
and a suitable number of address bits. The reservation
station 85 includes a multiplexer 81, a load store
controller 83, a merge circuit 91 and a FIFO 82 for queuing
up to four ROPs.

[0066] The multiplexer 81 includes 4:1 multiplexers
that are connected to the A and B-operand and tag
busses 30, 31, 48 and 49. Each FIFO entry in the
reservation station 85 holds all of the information fields that
are necessary to execute a load or store operation. In
one processor clock cycle, up to two ROPs are issued
and up to two FIFO entries are retired. The load/store
reservation station 85 is connected, at its inputs, to the
four A and B operand busses 30 and 31, the four A and
B operand tag busses 48 and 49, the five result busses
32, the four destination tag busses 40 and the four
opcode/type busses 50. The reservation station 85 is also
connected to the data portions of ports A and B of data
cache 86. Reservation station 85 is connected to store
buffer 84 using A and B port reservation station data
busses RSDATA A and RSDATA B, respectively, and A
and B port reservation station address busses RSADDR
A and RSADDR B, respectively, which are also
connected to the address lines of ports A and B of the data cache
86. Reservation station 85 is connected to controller 83
using a reservation station load bus RSLOAD and a
reservation station shift bus RSHIFT. The store buffer 84 is
connected to the load/store result bus driver 87, the
address/data bus 11, and the load store controller 83 using
a store buffer load bus SBLOAD and a store buffer shift
bus SBSHIFT. In addition to connections with
reservation station 85 and store buffer 84, load store controller
83 is connected to data cache 86 and reorder buffer 26.
In addition to connections to store buffer 84, the
load/store result bus driver connects to the data cache 86
and to the five result busses 32 and the five result tag
busses 39.

[0067] Data cache 86 is a linearly addressed, 4-way
interleaved, 8 Kbyte 4-way set associative cache that
supports two operations per clock cycle. Data cache 86
is arranged as 128 sixteen-byte entries. Each 16-byte
entry is stored in a line of four individually addressed
32-bit banks. Individually addressable banks permit the
data cache 86 to be accessed concurrently by two
ROPs, such as two simultaneous load operations, while
avoiding the overhead identified with dual porting.
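
The stated cache dimensions may be checked with a short calculation. The Python sketch below takes one reading of the figures, namely 128 sets of four 16-byte lines, and assumes that address bits <3:2> select the bank and that two accesses may proceed together only when they address different banks. The address decomposition and the conflict rule are assumptions made for this illustration and are not taken from the description.

```python
CACHE_BYTES = 8 * 1024  # 8 Kbyte cache
WAYS = 4                # 4-way set associative
LINE_BYTES = 16         # sixteen-byte entries
BANK_BYTES = 4          # four 32-bit banks per line

# 8192 / (4 * 16) = 128, consistent with "128 sixteen-byte entries" per way.
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)


def decompose(addr):
    """Split a linear address into (tag, set index, bank); layout assumed."""
    offset = addr % LINE_BYTES
    bank = offset // BANK_BYTES
    index = (addr // LINE_BYTES) % SETS
    tag = addr // (LINE_BYTES * SETS)
    return tag, index, bank


def can_access_together(addr_a, addr_b):
    """Assumed rule: two accesses in one cycle must use different banks."""
    return decompose(addr_a)[2] != decompose(addr_b)[2]
```

Under these assumptions, two loads from adjacent doublewords fall in different banks and may be serviced in the same cycle.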
[0068] A load operation reads data from the data
cache 86. During a load operation, reservation station
85 supplies an address to data cache 86. If the address
generates a cache hit, data cache 86 furnishes the data
which is stored in a corresponding bank and block of a
store array (not shown) of the data cache 86 to
reservation station 85. A doubleword is transferred from the
data cache 86 to the load/store result bus driver 87. The
upper two bits of the load/store instruction opcode
specify the size of the result to be produced. The types of
results are doublewords, words, high bytes or low bytes.
Unused bits are set to zero. For high bytes, the result
produced by executing the ROP is remapped into the
middle bit field before the result is driven onto the result
busses 32 by the load/store result bus driver 87. High
bytes are always read from the middle bit field of the
operand. Load/store result bus driver 87 masks unused
portions of data that are read by the doubleword read
operation. If the AHBYTE signal is set, the load/store
result bus driver 87 remaps the low field data bits <7:0>
into the middle field bits <15:8>. The bus driver 87 then
drives the result on one of the result busses 32. If the
address was supplied to data cache 86 over port A, then
the data is provided to reservation station circuit 85 via
port A. Otherwise, if the address was presented to data
cache 86 using port B, then the data is communicated
to reservation station 85 using port B. Addresses are
communicated to data cache 86 and data is received
from data cache 86 using ports A and B simultaneously.
As the load/store result bus driver 87 drives the result
onto one of the result busses 32, it also drives the
corresponding one of the result tag busses 39.
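
The masking and remapping of a load result may be sketched as follows. The function name and the string size names are hypothetical, and the sketch assumes that the requested data has already been aligned to the low-order bits of the doubleword; the encoding of the size in the opcode is not modelled.

```python
def format_load_result(dword, size):
    """Mask a loaded doubleword to the requested result size.

    Unused bits are set to zero. A high-byte result is remapped from
    bits <7:0> into the middle field, bits <15:8>.
    """
    if size == "doubleword":
        return dword & 0xFFFFFFFF
    if size == "word":
        return dword & 0xFFFF
    if size == "low_byte":
        return dword & 0xFF
    if size == "high_byte":
        return (dword & 0xFF) << 8
    raise ValueError("unknown result size: " + size)
```

A high-byte load of a doubleword whose low byte is 0x78 therefore drives 0x00007800 onto the result bus.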

[0069] A store operation is a doubleword read operation
from data cache 86, followed by a doubleword write
back to the cache 86. During a store operation, an
addressed doubleword is first transferred from data cache
86 to store buffer 84. Then the data is communicated
from reservation station 85 to store buffer 84. If the store
data is 32 bits or more in width, the data replaces the
doubleword that was read from data cache 86. If the
store data is less than 32 bits in width, the merge circuit
91 merges the applicable data fields into the doubleword
that was read from data cache 86. If a portion of the
store data is not available, then an operand tag is used
to replace the unavailable data. The mix of data and tags
is held in the store buffer until all bit fields of missing
data are forwarded from the result busses. By holding
partial data in the store buffer 84 until all fields are
available, only full doublewords are written to cache 86.
Writing of individual 8-bit bytes is not necessary. The
merged data is then communicated back to the data
cache 86 by the load/store result bus driver 87. Load
and store operations of store data that are greater than
32 bits in width execute multiple accesses to the data
cache 86 and construct the data in store buffer 84 before
writing it back to the data cache 86. When the store
operation is released, the data and corresponding address
are communicated using address/data bus 11 to data
cache 86. This embodiment is described in European
patent application EP 0 679 991 A.
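
The read, merge and write-back behaviour of a store narrower than a doubleword may be sketched as follows. The function name, the byte_offset and size_bytes parameters and the little-endian byte numbering are assumptions made for this Python illustration of the merge step only; the holding of tags for unavailable data is not modelled.

```python
def merge_store(cache_dword, store_data, byte_offset, size_bytes):
    """Merge store data into a doubleword previously read from the cache.

    byte_offset counts from the least significant byte (assumed layout).
    Data of 32 bits or more simply replaces the doubleword.
    """
    if size_bytes >= 4:
        return store_data & 0xFFFFFFFF
    if byte_offset + size_bytes > 4:
        raise ValueError("store crosses the doubleword boundary")
    mask = ((1 << (8 * size_bytes)) - 1) << (8 * byte_offset)
    shifted = (store_data << (8 * byte_offset)) & mask
    # Only the addressed bytes change; a full doubleword is written back.
    return (cache_dword & ~mask & 0xFFFFFFFF) | shifted
```

Storing the 16-bit value 0x1234 at byte offset 1 of the doubleword 0xAABBCCDD yields 0xAA1234DD, which is then written back as a whole.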
[0070] While the invention has been described with
reference to particular embodiments, it will be
understood that the embodiments are illustrative and that the
invention scope is not so limited. Many variations,
modifications, additions and improvements to the
embodiment described are possible. For example, the invention
may be implemented on a processor other than an x86
architecture processor or a CISC architecture
processor. Also, the bit width of the data path may be different
from 32 bits or 41 bits. The data path may be partitioned
into more or fewer fields than three. The number of bits
in the various structures and busses is illustrative, and
may be varied. The size of the register file and the
reorder buffer, the number of operand busses and operand
tag busses, the number of result busses, the number of
writeback busses, and the type and number of functional
units are illustrative, and may be varied. The invention
may be practiced in a processor that is not a superscalar
processor or a pipelined processor, although the
advantages of the invention are greater in a superscalar
implementation. These and other variations,
modifications, additions and improvements may fall within the
scope of the invention as defined in the following claims.



Claims 

1. A method for executing operations utilizing varia- 
ble-sized operands in a processor (10), the method 
comprising the steps of: 

partitioning full-sized operands bit-wise into a 
plurality of operand fields (215, 216, 217); 
designating each partitioned field as defined or 
undefined with respect to an operation; 
for each operation, determining for each operand
field independent of the other fields whether
data of each operand field that is utilized by
the operation is dependent on an unavailable
result of a nonexecuted operation;
executing the operation utilizing the operand
field data if data in all utilized fields are not
dependent on an unavailable result, and otherwise
waiting for dependent data in the fields to
become available and then executing the operation;

forwarding result data for utilization by the
operation when the result data becomes available
for each of the dependent fields independently
of the other partitioned fields.

2. A method as claimed in claim 1, wherein the
dependence determining step further comprises the
steps of:

defining for an operation a destination operand
and destination operand fields and a source
operand and source operand fields;

storing identifiers of the destination operand
and the destination operand fields for several
operations;

comparing the source operand and source
operand field identifiers to the stored destination
operands and destination operand fields; and

determining a data dependency when source
operand and source operand field identifiers
match stored destination operand and
destination operand field identifiers.

3. A method as claimed in claim 1 wherein the step of
determining whether data is dependent further
comprises:

4. A method as in Claim 3, further comprising the steps
of:

executing an operation to produce result data;
and

storing the result data in one of a plurality of
memory elements (62).

5. A method as in Claim 4, further comprising the step
of utilizing data stored in a memory element for each
non-dependent field independently of the other
fields.

6. A method as in Claim 5, wherein operands are
source operands that are operated upon by an
operation and destination operands that are generated
by an operation, further comprising the steps of:

tagging each defined field of a first operation's
destination operand independently of the other
fields with a destination tag that identifies a
memory element which receives the
operation's result data; and

tagging each defined field of a second
operation's source operand independently of the
other fields with a source tag that identifies a
memory element which supplies the operation's
operand data,

wherein the dependency detecting step
includes the steps of:

comparing the destination tag to the source tag,
and

activating the forwarding step when the tags
mutually correspond for each defined field
independently of the other fields.

7. A data handling apparatus for a processor (10)
which executes operations utilizing operands of a
variable size, the apparatus comprising:

means (18) for partitioning an operand utilized
by an operation into a plurality of fields (215,
216, 217);

means (36, 37) responsive to the partitioning
means for designating each partitioned field as
defined or undefined with respect to the
operation;

means (26) responsive to the designating
means for detecting data dependencies of each
of the defined fields independently of the other
partitioned fields;

means (44) responsive to the dependency
detecting means for forwarding result data for
utilization by the operation when the result data
becomes available for each of the dependent
fields independently of the other partitioned
fields; and

a functional unit (20, 21, 22, 80) responsive to
the forwarding of result data to execute the
operation and generate a result.

8. An apparatus as in Claim 7, further comprising: 

a memory (62) coupled to the functional unit 
including memory elements to store result data. 

9. An apparatus as in Claim 8, further comprising: 

means for utilizing data stored in the memory 
elements for each non-dependent field independ- 
ently of the other fields. 

10. An apparatus as in claim 9, wherein operands
include source operands that are operated upon by
an operation and destination operands that are
generated by an operation, the apparatus further
comprising:

means (70) for tagging each defined field of a
first operation's destination operand independently
of the other fields with a destination tag
which identifies a memory element that
receives the operation's result data; and
means (76) for tagging each defined field of a
second operation's source operand independently
of the other fields with a source tag which
identifies a memory element that supplies the
operation's operand data,

wherein the dependency detecting means
includes:

a comparator (41, 42) which compares the
destination tag to the source tag and activates the
forwarding means when the tags mutually
correspond for each defined field independently of
the other fields.

11. A processor comprising data handling apparatus as
claimed in claim 7, and:

an instruction decoder (18) including the means
for partitioning an operand and the means for
designating each field of the operand;
a reorder buffer (26) coupled to the instruction
decoder including a memory (74) storing operand
data and the means for detecting data
dependencies;

a bus (30, 31, 32) coupled to the reorder buffer
to communicate operand data for each defined
field independently of the other fields; and

the functional unit being coupled to the bus.

12. A processor as in Claim 11, wherein the bus includes:

an operand bus (30, 31) coupled from the reorder
buffer output to the functional unit input to
communicate operand data; and

a result bus (32) coupled from the functional
unit output to the inputs of the reorder buffer
and the functional unit to communicate result data.

13. A processor as in Claim 12, wherein the reorder
buffer further comprises:

means (70) for assigning a destination tag identifier
which designates a memory element to receive
operation execution result data for each
operand field independently of the other fields;

means (76) for assigning an operand tag identifier
which designates a source operand of a
data dependent operation for each operand
field independently of the other fields, the
processor further comprising:

a destination tag bus (40) coupled from the
reorder buffer output to the functional unit
input to convey the destination tag identifier
of an operation antecedent to its execution
by a functional unit;

a result tag bus (39) coupled from the functional
unit output to the inputs of the functional
unit and the reorder buffer to convey
the destination tag identifier of an operation
subsequent to its execution by a functional
unit;

an operand tag bus (48, 49) coupled from
the reorder buffer output to the functional
unit input to convey the operand tag identifier
of a data dependent operation; and

a reservation station (44) associated with
the functional unit and coupled to the operand
tag bus, the result tag bus and the result
bus, the reservation station including:

a comparator (81) which compares an
identifier from the operand tag bus to
an identifier from the result tag bus for
each defined field independently of the
other fields; and

a forwarding circuit (90, 92) coupled to
the result bus and the comparator and
responsive to mutually corresponding
operand tag and result tag identifiers
to forward result data for each defined
field independently of the other fields.

14. A processor as in Claim 13, further comprising:

a register file (24) coupled to the reorder buffer
and the operand bus and having a memory (62)
wherein:

the reorder buffer memory stores speculative
operation results, and

the register file stores operation results from
the reorder buffer memory when the operations
become nonspeculative.

15. A processor as in Claim 14, further comprising:

a reorder buffer bus driver (76) including means
for driving data from the reorder buffer memory
onto an operand bus for each field having
speculative data available independently of the
other fields; and

a register file bus driver (68) including means
for driving data from the register file memory
onto an operand bus for the remaining fields
other than those driven by the reorder buffer
bus driver.

16. A processor as in Claim 15, further comprising:

a result bus driver (93) coupled to the functional
unit and responsive to the execution of an
operation to drive result data from the functional
unit onto a result bus.

17. A processor as in Claim 16, wherein the functional
unit further comprises means for setting a result
field to 0 for each undefined field.

18. A processor as in any one of Claims 7 to 17, wherein
the operand has a bit width of 32 bits and is
partitioned by the partitioning means into three fields
including a high order 16 bit field, a middle order 8
bit field and a low order 8 bit field.



Patentansprüche (Claims, German text)

1. A method for executing operations using operands
of variable size in a processor (10), the method
comprising the following steps:

bitwise partitioning of full-size operands into a
plurality of operand fields (215, 216, 217);

designating each partitioned field as defined or
undefined for an operation;

for each operation, determining for each operand
field, independently of the other fields, whether
data of each operand field used by the operation
are dependent on an unavailable result of a
non-executed operation;

executing the operation using the operand field
data if the data in all of the fields used are not
dependent on an unavailable result, otherwise
waiting for dependent data in the fields to become
available and then executing the operation; and

forwarding result data for utilization by the
operation when the result data become available
for each of the dependent fields, independently
of the other partitioned fields.
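
The execute-or-wait decision of method claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the field names, the `pending` mapping (field name to the number of the not yet executed producing operation) and both function names are hypothetical and do not come from the patent.

```python
from typing import Dict, Iterable, Set


def unavailable_fields(used: Iterable[str], pending: Dict[str, int]) -> Set[str]:
    """Fields the operation uses whose data still depend on an unexecuted producer."""
    return {name for name in used if name in pending}


def decide(used: Iterable[str], pending: Dict[str, int]) -> str:
    """Decide, field by field, whether the operation may execute now."""
    return "wait" if unavailable_fields(used, pending) else "execute"


# Operation 12 has not executed yet and will write only the high field.
pending = {"high": 12}

print(decide({"low"}, pending))          # uses no pending field
print(decide({"high", "low"}, pending))  # uses the pending high field
```

Under this model an operation that touches only the low field is not held up by a pending write to the high field, which is the effect of checking dependencies per field rather than per operand.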

2. A method as in Claim 1, wherein the step of
determining the dependency further comprises the
following steps:

defining a destination operand and destination
operand fields and a source operand and source
operand fields for an operation;

storing identifiers of the destination operand and
of the destination operand fields for a plurality
of operations;

comparing the identifiers of the source operand
and of the source operand fields with the stored
destination operands and destination operand
fields; and

determining a data dependency when the
identifiers of the source operand and of the
source operand fields match the stored
identifiers of the destination operands and
destination operand fields.

3. A method as in Claim 1, wherein the step of
determining the data dependency further comprises
the following step:

detecting data dependencies of each of the
defined operand fields independently of the
other partitioned operand fields.

4. A method as in Claim 3, further comprising the
following steps:

executing an operation to generate result data;
and

storing the result data in one of a plurality of
memory elements (62).

5. A method as in Claim 4, further comprising the step
of utilizing data stored in a memory element for
each non-dependent field independently of the
other fields.

6. A method as in Claim 5, wherein operands are
source operands that are operated upon by an
operation and destination operands that are
generated by an operation, the method further
comprising the following steps:

tagging each defined field of a first operation's
destination operand independently of the other
fields with a destination tag which identifies a
memory element that receives the operation's
result data; and

tagging each defined field of a second operation's
source operand independently of the other
fields with a source tag which identifies a
memory element that supplies the operation's
operand data,

wherein the dependency detecting step comprises
the following steps:

comparing the destination tag to the source
tag, and

activating the forwarding step when the tags
mutually correspond for each defined field
independently of the other fields.

7. A data handling apparatus for a processor (10)
which executes operations using operands of variable
size, the apparatus comprising:

means (18) for partitioning an operand used by
an operation into a plurality of fields
(215, 216, 217);

means (36, 37) responsive to the partitioning
means for designating each partitioned field as
defined or undefined with respect to the
operation;

means (26) responsive to the designating means
for detecting data dependencies of each of the
defined fields independently of the other
partitioned fields;

means (44) responsive to the dependency
detecting means for forwarding result data for
utilization by the operation when the result data
become available for each of the dependent
fields independently of the other partitioned
fields; and

a functional unit (20, 21, 22, 80) responsive to
the forwarding of result data to execute the
operation and generate a result.

8. An apparatus as in Claim 7, further comprising:

a memory (62) coupled to the functional unit
and including memory elements to store result data.

9. An apparatus as in Claim 8, further comprising:

means for utilizing data stored in the memory
elements for each non-dependent field
independently of the other fields.

10. An apparatus as in Claim 9, wherein the operands
include source operands that are operated upon by
an operation and destination operands that are
generated by an operation, the apparatus further
comprising:

means (70) for tagging each defined field of a
first operation's destination operand independently
of the other fields with a destination tag which
identifies a memory element that receives the
operation's result data; and

means (76) for tagging each defined field of a
second operation's source operand independently
of the other fields with a source tag which
identifies a memory element that supplies the
operation's operand data,

wherein the dependency detecting means includes:

a comparator (41, 42) which compares the destination
tag to the source tag and activates the forwarding
means when the tags mutually correspond for each
defined field independently of the other fields.
11. A processor comprising a data handling apparatus
as in Claim 7, and:

an instruction decoder (18) including the means
for partitioning an operand and the means for
designating each field of the operand;

a reorder buffer (26) coupled to the instruction
decoder and including a memory (74) in which
operand data are stored and the means for
detecting data dependencies;

a bus (30, 31, 32) coupled to the reorder buffer
to communicate operand data for each defined
field independently of the other fields;

the functional unit being coupled to the bus.

12. A processor as in Claim 11, wherein the bus
includes:

an operand bus (30, 31) coupled between the
reorder buffer output and the functional unit
input to communicate operand data; and

a result bus (32) coupled between the functional
unit output and the inputs of the reorder buffer
and the functional unit to communicate result
data.

13. A processor as in Claim 12, wherein the reorder
buffer further comprises:

means (70) for assigning a destination tag
identifier which designates a memory element
to receive result data of an operation execution
for each operand field independently of the
other fields;

means (76) for assigning an operand tag
identifier which designates a source operand of
a data dependent operation for each operand
field independently of the other fields,
the processor further comprising:

a destination tag bus (40) coupled between
the reorder buffer output and the functional
unit input to convey the destination tag
identifier of an operation before its
execution by a functional unit;

a result tag bus (39) coupled between the
functional unit output and the inputs of the
functional unit and the reorder buffer to
convey the destination tag identifier of an
operation after its execution by a
functional unit;

an operand tag bus (48, 49) coupled between
the reorder buffer output and the functional
unit input to convey the operand tag
identifier of a data dependent operation; and

a reservation station (44) associated with
the functional unit and coupled to the
operand tag bus, the result tag bus and the
result bus, the reservation station including:

a comparator (81) for comparing an
identifier from the operand tag bus to
an identifier from the result tag bus for
each defined field independently of the
other fields; and

a forwarding circuit (90, 92) coupled to
the result bus and the comparator and
responsive to mutually corresponding
operand tag and result tag identifiers
to forward result data for each defined
field independently of the other fields.

14. A processor as in Claim 13, further comprising:

a register file (24) coupled to the reorder buffer
and the operand bus and having a memory (62),
wherein:

speculative operation results are stored in the
reorder buffer memory, and

operation results from the reorder buffer memory
are stored in the register file when the
operations become nonspeculative.

15. A processor as in Claim 14, further comprising:

a reorder buffer bus driver (76) including means
for driving data from the reorder buffer memory
onto an operand bus for each field having
speculative data available, independently of the
other fields; and

a register file bus driver (68) including means
for driving data from the register file memory
onto an operand bus for the remaining fields
that are not driven by the reorder buffer bus
driver.
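
The division of labour between the two bus drivers in claim 15 can be pictured with a small sketch. It assumes, purely for illustration, that each source is a dictionary keyed by the hypothetical field names `high`, `mid` and `low`, and that a field absent from `rob_fields` means the reorder buffer holds no speculative data for it; `drive_operand_bus` is an invented name.

```python
from typing import Dict

FIELDS = ("high", "mid", "low")  # hypothetical field names


def drive_operand_bus(rob_fields: Dict[str, int],
                      regfile_fields: Dict[str, int]) -> Dict[str, int]:
    """Assemble one operand, choosing the driver for each field independently."""
    bus = {}
    for name in FIELDS:
        if name in rob_fields:
            # Speculative data available: the reorder buffer driver wins.
            bus[name] = rob_fields[name]
        else:
            # Remaining fields come from the register file driver.
            bus[name] = regfile_fields[name]
    return bus


committed = {"high": 0x1234, "mid": 0x56, "low": 0x78}  # register file contents
speculative = {"low": 0xAB}                             # newer low byte only
print(drive_operand_bus(speculative, committed))
```

Here the low field is taken from the reorder buffer while the high and middle fields are taken from the register file, so a single operand can be assembled from both sources.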

16. A processor as in Claim 15, further comprising:

a result bus driver (93) coupled to the functional
unit and responsive to the execution of an
operation to drive result data from the functional
unit onto the result bus.

17. A processor as in Claim 16, wherein the functional
unit further comprises means for setting a result
field to 0 for each undefined field.

18. A processor as in any one of Claims 7 to 17, wherein
the operand has a bit width of 32 bits and is
partitioned by the partitioning means into three
fields, including a high order 16 bit field, a middle
order 8 bit field and a low order 8 bit field.
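
The 16/8/8 partition of a 32 bit operand recited in claim 18 amounts to shifting and masking. The following is a minimal sketch; the function names `partition` and `merge` and the dictionary keys are assumptions made for the example.

```python
from typing import Dict


def partition(operand: int) -> Dict[str, int]:
    """Split a 32 bit operand into a 16 bit high, 8 bit middle and 8 bit low field."""
    operand &= 0xFFFFFFFF  # keep only the 32 operand bits
    return {
        "high": (operand >> 16) & 0xFFFF,  # bits 31..16
        "mid": (operand >> 8) & 0xFF,      # bits 15..8
        "low": operand & 0xFF,             # bits 7..0
    }


def merge(fields: Dict[str, int]) -> int:
    """Reassemble the 32 bit operand from its three fields."""
    return ((fields["high"] & 0xFFFF) << 16) | ((fields["mid"] & 0xFF) << 8) | (fields["low"] & 0xFF)


parts = partition(0x12345678)
print({name: hex(value) for name, value in parts.items()})
print(hex(merge(parts)))
```

Splitting and merging are inverse operations on any 32 bit value, so each field can be tracked separately without losing information.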



Revendications (Claims, French text)

1. A method for executing operations using operands
of variable size in a processor (10), the method
comprising the steps of:

partitioning full-size operands at the bit level
into a plurality of operand fields (215, 216, 217);

designating each partitioned field as defined or
undefined with respect to an operation;

for each operation, determining for each operand
field, independently of the other fields, whether
the data of each operand field used by the
operation are dependent on an unavailable result
of a non-executed operation;

executing the operation using the operand field
data if the data in all of the fields used are not
dependent on an unavailable result and, otherwise,
waiting for dependent data in the fields to
become available, then executing the operation;
and

forwarding the result data for utilization by the
operation when the result data become available
for each of the dependent fields independently
of the other partitioned fields.

2. A method as in Claim 1, wherein the step of
determining the dependency further comprises the
steps of:

defining, for an operation, a destination operand
and destination operand fields as well as a
source operand and source operand fields;

storing identifiers of the destination operand and
of the destination operand fields for a plurality
of operations;

comparing the identifiers of the source operand
and the source operand fields with the stored
destination operands and destination operand
fields; and

determining a data dependency when the source
operand and source operand field identifiers
correspond to the stored destination operand and
destination operand field identifiers.
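
The store-and-compare procedure of claim 2 can be sketched as a small lookup table. Everything named here is hypothetical: `DestinationTable`, the use of a register name plus a field name as the stored identifier, and the integer operation numbers are assumptions made for the example, not details taken from the patent.

```python
from typing import Dict, Set, Tuple

# A stored identifier: (register name, field name). Both are illustrative.
FieldId = Tuple[str, str]


class DestinationTable:
    """Remembers which operand fields pending operations will write."""

    def __init__(self) -> None:
        # Stored destination identifier -> number of the producing operation.
        self._pending: Dict[FieldId, int] = {}

    def record(self, op_id: int, register: str, fields: Set[str]) -> None:
        """Store the destination operand and field identifiers of an operation."""
        for name in fields:
            self._pending[(register, name)] = op_id  # the newest producer wins

    def dependencies(self, register: str, fields: Set[str]) -> Dict[str, int]:
        """Compare source identifiers with stored ones; return matches per field."""
        return {
            name: self._pending[(register, name)]
            for name in sorted(fields)
            if (register, name) in self._pending
        }


table = DestinationTable()
table.record(1, "r3", {"low"})                    # operation 1 writes r3, low field
print(table.dependencies("r3", {"mid", "low"}))   # a later read of r3, mid and low
print(table.dependencies("r4", {"low"}))          # a read of an unrelated register
```

A match is reported only for the field that a pending operation actually writes, so a later operation that reads other fields of the same register sees no dependency on them.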

3. A method as in Claim 1, wherein the step of
determining whether the data are dependent further
comprises a step of:

detecting the data dependencies of each of the
defined operand fields independently of the other
partitioned operand fields.

4. A method as in Claim 3, further comprising the
steps of:

executing an operation to generate result data;
and

storing the result data in one memory element
among a plurality of memory elements (62).

5. A method as in Claim 4, further comprising the step
of utilizing data stored in a memory element for each
non-dependent field independently of the other
fields.

6. A method as in Claim 5, wherein operands are
source operands that are operated upon by an
operation and destination operands that are
generated by an operation, further comprising the
steps of:

tagging each defined field of a destination
operand of a first operation independently of the
other fields with a destination tag which
identifies a memory element that receives the
operation's result data; and

tagging each defined field of a source operand
of a second operation independently of the other
fields with a source tag which identifies a
memory element that supplies the operation's
operand data;

wherein the dependency detecting step includes
the steps of:

comparing the destination tag to the source
tag, and

activating the forwarding step when the tags
mutually correspond for each defined field
independently of the other fields.

7. A data handling apparatus for a processor (10)
which executes operations using operands of a
variable size, the apparatus comprising:

means (18) for partitioning an operand used by
an operation into a plurality of fields
(215, 216, 217);

means (36, 37) responsive to the partitioning
means for designating each partitioned field as
defined or undefined with respect to the
operation;

means (26) responsive to the designating means
for detecting data dependencies of each of the
defined fields independently of the other
partitioned fields;

means (44) responsive to the dependency
detecting means for forwarding the result data
for utilization by the operation when the result
data become available for each of the dependent
fields, independently of the other partitioned
fields; and

a functional unit (20, 21, 22, 80) responsive to
the forwarding of the result data to execute the
operation and generate a result.

8. An apparatus as in Claim 7, further comprising:

a memory (62) coupled to the functional unit
and including memory elements to store result data.

9. An apparatus as in Claim 8, further comprising:

means for utilizing data stored in the memory
elements for each non-dependent field,
independently of the other fields.

10. An apparatus as in Claim 9, wherein operands
include source operands that are operated upon by
an operation and destination operands that are
generated by an operation, the apparatus further
comprising:

means (70) for tagging each defined field of a
destination operand of a first operation
independently of the other fields with a
destination tag which identifies a memory element
that receives the operation's result data; and

means (76) for tagging each defined field of a
source operand of a second operation
independently of the other fields with a source
tag which identifies a memory element that
supplies the operation's operand data,

wherein the dependency detecting means includes:

a comparator (41, 42) which compares the destination
tag to the source tag and activates the forwarding
means when the tags mutually correspond for each
defined field independently of the other fields.

11. A processor comprising a data handling apparatus
as described in Claim 7, and:

an instruction decoder (18) including the means
for partitioning an operand and the means for
designating each field of the operand;

a reorder buffer (26) coupled to the instruction
decoder including a memory (74) storing operand
data and the means for detecting the data
dependencies;

a bus (30, 31, 32) coupled to the reorder buffer
to communicate operand data for each defined
field independently of the other fields; and

the functional unit being coupled to the bus.

12. A processor as in Claim 11, wherein the bus
includes:

an operand bus (30, 31) coupled from the reorder
buffer output to the functional unit input to
communicate operand data; and

a result bus (32) coupled from the functional
unit output to the inputs of the reorder buffer
and the functional unit to communicate result
data.

13. A processor as in Claim 12, wherein the reorder
buffer comprises:

means (70) for assigning a destination tag
identifier which designates a memory element to
receive the result data of the execution of the
operation for each operand field independently of
the other fields;

means (76) for assigning an operand tag identifier
which designates a source operand of a data
dependent operation for each operand field
independently of the other fields, the processor
further comprising:

a destination tag bus (40) coupled from the
reorder buffer output to the functional unit
input to convey the destination tag identifier
of an operation prior to its execution by a
functional unit;

a result tag bus (39) coupled from the
functional unit output to the inputs of the
functional unit and the reorder buffer to
convey the destination tag identifier of an
operation subsequent to its execution by a
functional unit;

an operand tag bus (48, 49) coupled from the
reorder buffer output to the functional unit
input to convey the operand tag identifier of
a data dependent operation; and

a reservation station (44) associated with the
functional unit and coupled to the operand tag
bus, the result tag bus and the result bus, the
reservation station including:

a comparator (81) which compares an
identifier from the operand tag bus to an
identifier from the result tag bus for each
defined field independently of the other
fields; and

a forwarding circuit (90, 92) coupled to the
result bus and the comparator and
responsive to mutually corresponding
result tag and operand tag identifiers to
forward result data for each defined field
independently of the other fields.
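
The comparator and forwarding circuit of the reservation station in claim 13 can be modelled in a few lines. This is a behavioural sketch only: the class `ReservationEntry`, its `snoop` method, and the use of dictionaries keyed by hypothetical field names to stand in for the tag and result buses are assumptions of the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReservationEntry:
    """One waiting operand, tracked field by field (hypothetical model)."""
    operand_tags: Dict[str, int]                        # field -> tag still awaited
    data: Dict[str, int] = field(default_factory=dict)  # field -> captured value

    def snoop(self, result_tags: Dict[str, int], result_data: Dict[str, int]) -> None:
        """Compare result tags with operand tags and capture matching fields."""
        for name, tag in result_tags.items():
            if self.operand_tags.get(name) == tag:   # comparator, one field at a time
                self.data[name] = result_data[name]  # forwarding, one field at a time
                del self.operand_tags[name]          # this field no longer waits

    def ready(self) -> bool:
        """True once no field is waiting on a result tag."""
        return not self.operand_tags


entry = ReservationEntry({"mid": 7, "low": 9})
entry.snoop({"low": 9}, {"low": 0x78})                            # first broadcast
print(entry.ready(), entry.data)
entry.snoop({"mid": 7, "low": 9}, {"mid": 0x56, "low": 0xFF})     # second broadcast
print(entry.ready(), entry.data)
```

In the second broadcast the low field is ignored because its tag has already been satisfied, which shows that each field is matched and forwarded on its own.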

14. A processor as in Claim 13, further comprising:

a register file (24) coupled to the reorder buffer
and the operand bus and having a memory (62),
wherein:

the reorder buffer memory stores speculative
operation results, and

the register file stores the operation results from
the reorder buffer memory when the operations
become nonspeculative.

15. A processor as in Claim 14, further comprising:

a reorder buffer bus driver (76) including means
for driving data from the reorder buffer memory
onto an operand bus for each field having
speculative data available independently of the
other fields; and

a register file bus driver (68) including means
for driving data from the register file memory
onto an operand bus for the remaining fields
other than those driven by the reorder buffer
bus driver.

16. A processor as in Claim 15, further comprising:

a result bus driver (93) coupled to the functional
unit and responsive to the execution of an
operation to drive result data from the functional
unit onto a result bus.

17. A processor as in Claim 16, wherein the functional
unit further comprises means for setting a result
field to 0 for each undefined field.
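
Setting undefined result fields to 0, as recited in claim 17, can be expressed as a mask over the result word. The sketch below assumes the 16/8/8 field layout of claim 18 and hypothetical field names; `zero_undefined` is an invented helper, not a name from the patent.

```python
from typing import Iterable

# Bit masks for the three fields of a 32 bit result (layout assumed from claim 18).
FIELD_MASKS = {"high": 0xFFFF0000, "mid": 0x0000FF00, "low": 0x000000FF}


def zero_undefined(result: int, defined: Iterable[str]) -> int:
    """Keep the defined fields of a 32 bit result and force all other fields to 0."""
    keep = 0
    for name in defined:
        keep |= FIELD_MASKS[name]  # accumulate the bits that stay
    return result & keep


print(hex(zero_undefined(0x12345678, {"low"})))         # byte-wide result
print(hex(zero_undefined(0x12345678, {"mid", "low"})))  # 16 bit result
```

Only the fields the operation defines survive; every other field of the result word reads as zero.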

18. A processor as in any one of Claims 7 to 17, wherein
the operand has a bit width of 32 bits and is
partitioned by the partitioning means into three
fields including a left 16 bit field, a middle 8 bit
field and a right 8 bit field.